Composite

Part:BBa_K4701306:Design

Designed by: Henri Sundquist   Group: iGEM23_Aalto-Helsinki   (2023-08-10)


Synthetic tpaK-tphII operon (High transcription)


Assembly Compatibility:
  • 10
    COMPATIBLE WITH RFC[10]
  • 12
    INCOMPATIBLE WITH RFC[12]
    Illegal NheI site found at 7
    Illegal NheI site found at 30
  • 21
    INCOMPATIBLE WITH RFC[21]
    Illegal XhoI site found at 5244
  • 23
    COMPATIBLE WITH RFC[23]
  • 25
    INCOMPATIBLE WITH RFC[25]
    Illegal NgoMIV site found at 5343
  • 1000
    INCOMPATIBLE WITH RFC[1000]
    Illegal BsaI.rc site found at 5334
    Illegal SapI site found at 5291

Note

Here we outline the final design of the synthetic tpaK-tph operon. The details of the iterative design process are described in our wiki rather than here, as they don’t necessarily concern the final design. The cloning strategy is discussed below, but the choice of pSEVA231 as the backbone was born out of necessity, and its justification is again left to the wiki. As the promoter in this design is constitutively on, we additionally created two other variants with different promoters. This part has the strongest promoter, BBa_K4701307 has a medium-strength promoter, and BBa_K4701308 has the weakest promoter.

Introduction

The part is intended to allow TPA utilization in Pseudomonas putida KT2440, and the choice of genes was inspired by a previous study [1]. The operon combines two elements. The first is the TPA transporter tpaK from Rhodococcus jostii, which was used successfully in the aforementioned study, albeit on a separate transcript from the rest of the genes. The second is the set of catabolic genes from the tph operon of Comamonas sp. E6 [2], which drive TPA metabolism once uptake is enabled. Figure 1 shows the overall structure of the operon.

Sketch of the different parts of the cassette
Figure 1: Sketch of the overall operon design. The 5’ end of the transcript includes a spacer sequence followed by a stabilizing hairpin. Next the five genes are placed back to back with a 25 bp RBS between each ORF.

Next we discuss each of the components, justify the choices and show how biophysical models were used to rationally design the part.

The tph operon and the TPA transporter

Comamonas sp. E6 contains a known TPA assimilation pathway enabled by two highly similar operons: tphI and tphII. Importantly, either operon by itself is sufficient to enable TPA utilization [2]. For the rest of this text, all references to the tph operon or the tph genes refer specifically to the tphII operon.

The full tphII operon includes six genes, tphRIICIIA2IIA3IIBIIA1II, with tphRII being transcribed separately and having a regulatory function in the presence of TPA. As we intend our modified P. putida to grow in an environment with abundant TPA, we leave tphRII out to keep the design simple. tphCII is the native TPA transport protein. However, in a recent study, no growth was observed when using tphCII in P. putida [1]. Instead, tpaK from R. jostii was used successfully when expressed separately from the rest of the operon. Here we attempt to simply swap out tphCII for tpaK to create a modular design. As for the catabolic genes tphA2IIA3IIBIIA1II, we keep their order the same, as we have no reason to do otherwise and the ordering might be relevant to efficient co-folding of the TPADO complex. The spacing and sizes of the catabolic genes in Comamonas sp. E6 are shown in figure 2.

Structure of the wild-type tph operon from Comamonas sp. E6
Figure 2: The catabolic tph operon from Comamonas sp. E6 (GenBank: AB238679). The 3’ end of tphA2II overlaps with the 5’ end of tphA3II, while the 3’ end of tphA3II overlaps with the 5’ end of tphBII. tphBII is separated from tphA1II by a 9 bp gap.

Sequences

Nucleotide sequences for tphA2IIA3IIBIIA1II were acquired from Comamonas sp. E6 (GenBank: AB238679). The sequence for tpaK was acquired from the full sequence of the pRHL2 plasmid from R. jostii RHA1 (GenBank: CP000433). Codons were optimized with the IDT codon optimization tool using the Pseudomonas putida codon usage tables such that all rare codons (frequency < 0.1) were eliminated. Furthermore, some manual tweaking was done to balance translation rates (see below), weaken alternative transcription start sites, and to remove restriction sites used in some of the BioBrick assembly standards.
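The removal of restriction sites mentioned above can be checked with a simple scan. The following is a minimal sketch, not the tool we used: it looks for the recognition sequences of the RFC 10 enzymes (EcoRI, XbaI, SpeI, PstI, NotI) on both strands. The function names and the toy sequence are hypothetical and only serve as an illustration.

```python
# Recognition sequences of the enzymes reserved by BioBrick RFC 10.
RFC10_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, sites=RFC10_SITES):
    """Return {enzyme: [1-based start positions]} for sites on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        # A set, so palindromic sites are not counted twice.
        patterns = {site, reverse_complement(site)}
        positions = [
            i + 1
            for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] in patterns
        ]
        if positions:
            hits[enzyme] = positions
    return hits


# Toy sequence (hypothetical, not taken from the part).
toy = "ATGGAATTCAAACTGCAGTAA"
print(find_sites(toy))  # {'EcoRI': [4], 'PstI': [13]}
```

An empty dictionary would indicate that the sequence is free of the scanned sites. Other standards can be checked by passing a different `sites` mapping.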

Promoter

The promoter used is BBa_J23102 from the Anderson promoter collection. This collection of promoters was benchmarked in Pseudomonas putida KT2440 in a recent paper [3], with BBa_J23102 showing the highest transcription rates. Note that there are alternative versions of this part with moderate and low predicted transcription rates.

Spacer sequence and the insulating hairpin

In the design, all five genes are expressed together on a single mRNA transcript, meaning that overall mRNA stability is important to allow the translation of all five ORFs. Furthermore, we want the mRNA structure around the first ribosome binding site to remain stable to allow for a constant translation initiation rate. This is important as the translation rates of downstream ORFs are affected by the translation rates of upstream ORFs via translational coupling [4]. High translation rates are also important for mRNA stability, as elongating ribosomes shield the transcript from RNases [5]. Lastly, striving for transcript stability will allow us to more reliably adjust overall protein levels by switching the promoter.

Rules in sequence design are rarely black and white, but since some spacer sequence is needed between the promoter and the first RBS regardless, we decided to try to make that spacer do something useful.

Many bacterial pathways for mRNA degradation are dependent on the 5’ UTR sequence of the transcript [5]. It has also been observed that the leading trinucleotide has an effect on the rate at which the 5’ end of an mRNA transcript is hydrolyzed, which acts as a starting point for many mRNA degradation pathways. ATG was observed to be the most stable leading trinucleotide, thus we follow the promoter with ATGATG.

This sequence is followed by CGAC, forming the SalI restriction site GTCGAC. This is useful as it gives us the ability to remove the promoter from our fragments down the line if the need arises to swap the promoter for something else, perhaps an inducible one. Furthermore, SalI is not a part of the BioBrick assembly standards.

Next, we introduce an insulating hairpin, which is intended to keep the mRNA structure around the first RBS stable as discussed above. Generally speaking, the mRNA folding affects translation initiation rates by affecting the ΔG of ribosome binding [6]. The sequence for the insulating hairpin was designed iteratively with Vienna RNAfold [7]. Figure 3 shows the details around the spacer sequence and the first RBS. Figure 4 further shows predicted transcription start sites, showing that the insulating hairpin is predicted to be included in the majority of transcripts.

Spacer sequence as predicted by Vienna RNAfold
Figure 3: Predicted centroid secondary structure of the mRNA transcript around the first RBS. We can see that the insulating hairpin forms upstream of the first RBS which itself is predicted to stay in a linear form allowing for easy ribosome binding. Different parts of the RBS are annotated as used in the biophysical model behind the RBS calculator [6].
Predicted transcription start rates per base
Figure 4: Transcription rates by base as predicted by the Promoter calculator [8]. Note that currently the model is only available for E. coli, but as it relies on modeling the biophysics between the sequence and the RNAP/σ70 complex, the predictions should in theory be applicable to P. putida KT2440 too. The sequence upstream of the promoter is not a part of BBa_K4701306, but is a result of our cloning strategy.

Terminator

A recent study benchmarked various terminator sequences in P. putida KT2440 [9]. The most effective terminator, named “T1” in the paper, has been previously added as part BBa_K3675003, and we include it in our design. The inclusion of a terminator is important as we want the part to act as a standalone cassette, and also because there doesn’t appear to be data on how effective the T0 terminator downstream of our insert in pSEVA231 is in P. putida [10].

Ribosome binding sites and operon optimization using biophysical models

The design of the ribosome binding sites and their flanking sequences is perhaps the most important part of the overall operon design, as we want all ORFs to be translated at a sufficient level. RBS sequences in P. putida generally aren’t as well characterized as in model organisms like E. coli [10]. However, biophysical models have been developed to design RBS sequences and to predict translation rates. Briefly, the RBS calculator [6] predicts the mRNA secondary structure around the RBS sequence and calculates the total ΔG of ribosome binding. The general idea is that the more negative this ΔG is, the more readily the ribosome binds spontaneously, which directly raises the translation initiation rate.
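The relationship between ΔG and initiation rate can be illustrated with a short sketch. It assumes the Boltzmann-like form used by this kind of thermodynamic model, where the rate is proportional to exp(−β·ΔG). The value β = 0.45 mol/kcal is an assumption taken from our reading of the RBS calculator paper [6], and the ΔG values in the example are hypothetical. The real calculator also computes ΔG itself from the sequence, which this sketch does not attempt.

```python
import math

# Apparent Boltzmann factor in mol/kcal. Assumed value, see the text above.
BETA = 0.45


def relative_initiation_rate(dg_total, beta=BETA):
    """Relative translation initiation rate for a total ΔG in kcal/mol.

    Only ratios between sites are meaningful, as the proportionality
    constant is left out.
    """
    return math.exp(-beta * dg_total)


def fold_change(dg_a, dg_b, beta=BETA):
    """How many times faster site A is predicted to initiate than site B."""
    return relative_initiation_rate(dg_a, beta) / relative_initiation_rate(dg_b, beta)


# Hypothetical example: site A binds 2 kcal/mol more favourably than site B.
print(round(fold_change(-6.0, -4.0), 2))  # 2.46
```

Under these assumptions, a change of about 2 kcal/mol in ΔG shifts the predicted rate roughly 2.5-fold, which is why small changes in the sequence around an RBS can noticeably change expression.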

However, RBS sequences in operons cannot be designed in isolation, as the standby sequence upstream of the Shine-Dalgarno sequence together with the 3’ end of the upstream gene affects mRNA secondary structure. Furthermore, translation rates of upstream ORFs affect the translation initiation rates of downstream ORFs via translational coupling. The operon calculator [4] is a biophysical model that takes these considerations into account, and can be used to predict translation rates in synthetic operons.

Figures 5 and 6 show the predicted translation and transcription rates respectively. The final translation rates range from roughly 8070 to 27750 on the relative arbitrary unit scale. It can also be seen that transcription is overwhelmingly predicted to proceed in the forward direction and to start from the beginning of the operon. These results were obtained after many rounds of iteration; the process is described in more detail in our wiki.

Predicted translation rates of the five ORFs
Figure 5: Predicted translation rates of the final construct. The range is between 27750 for tpaK to around 8070 for tphA1II.
Predicted transcription start sites over the whole operon
Figure 6: Predicted transcription rates of BBa_K4701306, with promoter J23102.

All in all, many modifications were made to the sequence over the ten or so design cycles. Briefly, intrinsic terminators called by the operon calculator were removed. Alternative Shine-Dalgarno-like sequences were removed in both strands to minimize the translation of alternative ORFs. Sequences resembling the -35 promoter element or the Pribnow box were eliminated to minimize alternative transcripts in both directions. Repeats were minimized to make sure fragment synthesis would be successful. The RBS sequences and their flanking sequences were tweaked many times to make the predicted translation rates as uniform as possible. All of this was achieved using silent mutations while taking care to keep codons moderately optimized and keeping the sequence compliant with assembly standards RFC 10 and RFC 23.
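As a rough illustration of the motif screening described above, the sketch below scans both strands for near-matches to a consensus motif. It is a simplification and not the method used by the operon or promoter calculators, which rely on biophysical models rather than string matching. The consensus sequences are the textbook E. coli ones and are used here as assumed stand-ins, and the mismatch threshold and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Textbook E. coli consensus motifs, used here as assumed stand-ins.
MOTIFS = {
    "Shine-Dalgarno": "AGGAGG",
    "-35 element": "TTGACA",
    "Pribnow box": "TATAAT",
}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan(seq, motif, max_mismatches=1):
    """Return (strand, 1-based start, window) for each near-match to motif.

    Minus-strand hits are reported in forward-strand coordinates, with the
    window shown as read 5' to 3' on the minus strand.
    """
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - k + 1):
            window = s[i:i + k]
            if hamming(window, motif) <= max_mismatches:
                start = i + 1 if strand == "+" else n - i - k + 1
                hits.append((strand, start, window))
    return hits


# Toy sequence (hypothetical): an SD-like motif on the plus strand and a
# Pribnow-like motif on the minus strand.
toy = "CCAGGAGGCCATTATACC"
print(scan(toy, MOTIFS["Shine-Dalgarno"], max_mismatches=0))  # [('+', 3, 'AGGAGG')]
print(scan(toy, MOTIFS["Pribnow box"], max_mismatches=0))     # [('-', 11, 'TATAAT')]
```

Each hit marks a candidate position where a silent mutation could weaken the motif. Whether a hit matters in practice still has to be judged with the biophysical models.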

Gibson fragments

As the final length of the synthetic operon is 5472 bp, it is too long to be synthesized in one go. Using restriction cloning to combine multiple fragments is rather clumsy, and as we had already had issues with the technique with our other inserts, we opted for a more suitable assembly method, Gibson assembly.

As the DNA fragments would be ordered from Twist Bioscience, which at the time of writing can synthesize fragments up to 1700 bp long, we needed to split our operon into four fragments. We planned to use the Gibson Assembly HiFi Cloning Kit from our sponsor Thermo Fisher Scientific for assembly, and thus we turned to their manuals when designing the assembly fragments. Generally speaking, the kit can be used to combine the four fragments with the linearized pSEVA231 vector in one reaction, given that the fragments share appropriate 40 bp overlaps. The overlaps should contain no extreme GC content, tandem repeats, or strong secondary structures. As Gibson assembly with our kit is done at 50 °C, secondary structures in the single-stranded overlaps with melting temperatures below this should not cause issues. Secondary structures formed by the overlaps were analyzed with the IDT-provided interface to UNAFold [11]. Potential self-dimers were screened for with the IDT OligoAnalyzer tool.
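The tandem repeat criterion above can be screened for with a regular expression. This is a minimal sketch under our own working definition: a unit of 2 to 6 nucleotides occurring at least three times in a row. The kit manual may define tandem repeats differently, so the thresholds and the function name are assumptions for illustration only.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (1-based start, repeat unit, copy number) for each tandem repeat.

    A tandem repeat is taken here to be a unit of min_unit..max_unit
    nucleotides repeated at least min_copies times back to back. These
    thresholds are illustrative assumptions.
    """
    # Lazy unit match so that the shortest repeating unit is reported.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start() + 1, unit, copies))
    return hits


# Toy sequences (hypothetical, not taken from the part).
print(find_tandem_repeats("GGACACACTT"))    # [(3, 'AC', 3)]
print(find_tandem_repeats("ATGCGTACCTGA"))  # []
```

An empty list means no repeat was found under this definition. Homopolymer runs are deliberately not covered and would need a separate check.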

Table 1 shows the chosen Gibson overlap sequences, while figure 7 shows how the fragments map to the construct.

Overlap | Sequence | Range | GC content | Tandem repeats | Tm
PacI/1 | TCTTTCGACTGAGCCTTTCGTTTTATTTGATGCCTTTAAT | 1-40 | 35% | No | 30.3 °C
1/2 | AGTTCTACGCCCTGCAAAGCTGGTTGCCGTCCATCATGAC | 949-988 | 55% | No | 43.5 °C
2/3 | TAACCCTCCAGATCCTCTCGGTATTCCCCGGTTTCGTCCT | 2390-2429 | 55% | No | 43.6 °C
3/4 | TACGTCGCAATGCTGCACGATCAGGGTCACATTCCTATCA | 4038-4077 | 50% | No | 39.1 °C
4/SpeI | CTAGTCTTGGACTCCTGTTGATAGATCCAGTAATGACCTC | 5433-5472 | 45% | No | 44.8 °C
Table 1: Gibson overlaps chosen for the assembly with pSEVA231 linearized with PacI and SpeI. Note that the first and last overlaps are not a part of BBa_K4701306, as they depend on the host vector. The indexing therefore includes these extra overlaps.
Mapping of Gibson fragments
Figure 7: Overview of the construct and correspondence to Gibson fragments used in assembly. See the design page for a description of each part and our wiki for the design process.
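The numbers in Table 1 can be cross-checked with a few lines of code. The sketch below recomputes the GC content of each overlap and derives the fragment lengths from the overlap ranges. It assumes that each ordered fragment spans from the start of its upstream overlap to the end of its downstream overlap, which is our reading of the table and not something the table states explicitly.

```python
# (name, sequence, start, end), as listed in Table 1.
OVERLAPS = [
    ("PacI/1", "TCTTTCGACTGAGCCTTTCGTTTTATTTGATGCCTTTAAT", 1, 40),
    ("1/2", "AGTTCTACGCCCTGCAAAGCTGGTTGCCGTCCATCATGAC", 949, 988),
    ("2/3", "TAACCCTCCAGATCCTCTCGGTATTCCCCGGTTTCGTCCT", 2390, 2429),
    ("3/4", "TACGTCGCAATGCTGCACGATCAGGGTCACATTCCTATCA", 4038, 4077),
    ("4/SpeI", "CTAGTCTTGGACTCCTGTTGATAGATCCAGTAATGACCTC", 5433, 5472),
]

MAX_SYNTHESIS_LENGTH = 1700  # bp, synthesis limit quoted in the text


def gc_percent(seq):
    """GC content of a sequence in percent."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


for name, seq, start, end in OVERLAPS:
    # Each overlap should be 40 bp and match its listed range.
    assert len(seq) == end - start + 1 == 40
    print(f"{name}: {gc_percent(seq):.0f}% GC")

# Assumption: fragment i runs from the start of overlap i to the end of
# overlap i + 1, so neighbouring fragments share one 40 bp overlap.
fragment_lengths = [
    downstream[3] - upstream[2] + 1
    for upstream, downstream in zip(OVERLAPS, OVERLAPS[1:])
]
print(fragment_lengths)  # [988, 1481, 1688, 1435]
print(all(length <= MAX_SYNTHESIS_LENGTH for length in fragment_lengths))  # True
```

Under this assumption all four fragments stay below the 1700 bp limit, with fragment 3 being the closest to it.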

For more information on the design process, refer to the part page on our wiki.

References

[1] Werner, AZ. et al. (2021) Tandem chemical deconstruction and biological upcycling of poly(ethylene terephthalate) to β-ketoadipic acid by Pseudomonas putida KT2440. Metabolic Engineering. 67, 250–261. https://doi.org/10.1016/j.ymben.2021.07.005

[2] Sasoh, M. et al. (2006) Characterization of the Terephthalate Degradation Genes of Comamonas sp. Strain E6. Applied and Environmental Microbiology. 72(3), 1825–1832. https://doi.org/10.1128/AEM.72.3.1825-1832.2006

[3] Pearson, AN. et al. (2023) The pGinger Family of Expression Plasmids. Microbiology Spectrum. 11(3). https://doi.org/10.1128/spectrum.00373-23

[4] Tian, T, Salis, HM. et al. (2015) A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons. Nucleic Acids Research. 43(14), 7137–7151. https://doi.org/10.1093/nar/gkv635

[5] Cetnar, DP, Salis, HM. et al. (2021) Systematic Quantification of Sequence and Structural Determinants Controlling mRNA stability in Bacterial Operons. ACS Synthetic Biology. 10(2), 318–332. https://doi.org/10.1021/acssynbio.0c00471

[6] Salis, HM. et al. (2009) Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology. 27(10), 946–950. https://doi.org/10.1038/nbt.1568

[7] Lorenz, R. et al. (2011) ViennaRNA Package 2.0. Algorithms for Molecular Biology. 6(1), 26. https://doi.org/10.1186/1748-7188-6-26

[8] LaFleur, TL, Hossain, A, Salis, HM. (2022) Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nature Communications. 13(1), 5159. https://doi.org/10.1038/s41467-022-32829-5

[9] Amarelle, V. et al. (2019) Expanding the Toolbox of Broad Host-Range Transcriptional Terminators for Proteobacteria through Metagenomics. ACS Synthetic Biology. 8(4), 647–654. https://doi.org/10.1021/acssynbio.8b00507

[10] Martin-Pascual, M. et al. (2021) A navigation guide of synthetic biology tools for Pseudomonas putida. Biotechnology Advances. 49, 107732. https://doi.org/10.1016/j.biotechadv.2021.107732

[11] Markham, NR, Zuker, M. (2008) UNAFold: software for nucleic acid folding and hybridization. Methods in Molecular Biology. 453, 3–31. https://doi.org/10.1007/978-1-60327-429-6_1